A Dynamically Reconfigurable Model for a Distributed Web Crawling System

Authors

  • Hongfei Yan
  • Jianyong Wang
  • Xiaoming Li
Abstract

A web crawling system with a distributed architecture must coordinate the whole system whenever its set of nodes changes. This paper presents an efficient dynamic reconfigurability model for such a system. By analyzing the model, we derive methods for optimizing the performance of a distributed web crawling system, i.e., retaining load balance while producing low network traffic. The model is currently being introduced to improve WebGather, a well-known Chinese and English web search engine. We also believe that the model can be useful in other web crawling systems that adopt a distributed architecture.
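The abstract does not describe the model itself, so the following is only a minimal sketch of the general problem it names: reassigning crawl work when nodes join or leave while keeping load balanced and data movement low. It uses rendezvous (highest-random-weight) hashing, a standard technique that is swapped in here for illustration and is not claimed to be the authors' method. The node names, the host names, and the functions `assign` and `moved_hosts` are all hypothetical.

```python
import hashlib
from typing import List, Sequence


def _score(host: str, node: str) -> int:
    """Deterministic pseudo-random weight for a (node, host) pair."""
    digest = hashlib.sha256(f"{node}|{host}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def assign(host: str, nodes: Sequence[str]) -> str:
    """Rendezvous hashing: the node with the highest score owns the host."""
    if not nodes:
        raise ValueError("at least one crawler node is required")
    return max(nodes, key=lambda node: _score(host, node))


def moved_hosts(hosts: Sequence[str], old_nodes: Sequence[str],
                new_nodes: Sequence[str]) -> List[str]:
    """Hosts whose owning node differs between two cluster configurations."""
    return [h for h in hosts if assign(h, old_nodes) != assign(h, new_nodes)]


# Hypothetical cluster and workload, for illustration only.
hosts = [f"site{i}.example.com" for i in range(1000)]
old_nodes = ["crawler-a", "crawler-b", "crawler-c"]
new_nodes = old_nodes + ["crawler-d"]

# Hosts that change owner when crawler-d joins the cluster.
moved = moved_hosts(hosts, old_nodes, new_nodes)
print(f"{len(moved)} of {len(hosts)} hosts change owner")
```

With this scheme, a host changes owner on a join only if the new node now scores highest for it, and on a departure only if it belonged to the departed node. Reconfiguration therefore touches only a fraction of the hosts, and the hash spreads that fraction roughly evenly across nodes.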

Similar resources

A maintenance system model for optimal reconfigurable vibrating screen management

The reconfigurable vibrating screen (RVS) machine is an innovative beneficiation machine designed for screening mineral particles of the varying sizes and volumes required by customers through the geometric transformation of its screen structure. Successful upkeep of the RVS machine requires its continuous availability, reliability and maintainability. The RVS machine downtime, which c...

Towards Distributed Web Mining in Net-Enabled Enterprises

In today’s information age, web sites have become an important source for business information collection and analysis. They provide a company with abundant information for competitor analysis and business intelligence. Also, web mining on a firm’s intranet can greatly assist the firm’s knowledge management efforts. However, web mining is a complex and resource-consuming process that con...

Prioritize the ordering of URL queue in Focused crawler

The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler, it is not a simple task to download domain-specific web pages. An unfocused approach often produces undesired results. Therefore, several new ideas have been proposed; among them, a key technique is focused crawling, which is able to crawl particular topical...

Service Description in a Distributed Search and Advertising System

Service description in a distributed system allows system components to know about one another and to make intelligent decisions regarding request routing and propagation. The paper discusses the service description model used in a distributed system for Web search and search-based advertising. A service description consists of a content description (terms), an attribute set, and a number of service par...

Scale-Adaptable Recrawl Strategies for DHT-Based Distributed Web Crawling System

A large-scale distributed Web crawling system that uses voluntarily contributed personal computing resources allows small companies to build their own search engines at very low cost. The biggest challenge for such a system is how to implement functionality equivalent to that of traditional search engines in a fluctuating distributed environment. One of the functionalities is incremental...

Publication date: 2001